High-Performance Matrix Multiplication

Author

  • Nelson H. F. Beebe
Abstract

This document describes techniques for speeding up matrix multiplication on some high-performance computer architectures, including the IBM RS-6000, the IBM 3090/600S-VF, the MIPS RC3240 and RC6280, the Stardent 3040, and the Sun SPARCstation. The methods illustrate general principles that can be applied to the inner loops of scientific code.

Similar resources

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...

On Composing Matrix Multiplication from Kernels

Matrix multiplication is often treated as a basic unit of computation in terms of which other operations are implemented, yielding high performance. In this paper initial evidence is provided that there is a benefit gained when lower level kernels, from which matrix multiplication is composed, are exposed. In particular it is shown that matrix multiplication itself can be coded at a high level ...

High-Performance Matrix-Vector Multiplication on the GPU

In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that it is essentially a matter of fully utilizing the fine-grained parallelism of the many-...

A High Performance Parallel Strassen Implementation

In this paper, we give what we believe to be the first high performance parallel implementation of Strassen's algorithm for matrix multiplication. We show how under restricted conditions, this algorithm can be implemented plug compatible with standard parallel matrix multiplication algorithms. Results obtained on a large Intel Paragon system show a 10-20% reduction in execution time compared to w...

An Optimized Matrix Multiplication on ARMv7 Architecture

A sufficiently optimized matrix multiplication on embedded systems can facilitate data processing in high performance mobile measuring equipment since plenty of the kernel mathematical algorithms are based on matrix multiplication. In this paper, we propose a matrix multiplication specially optimized for ARMv7 architecture. The performance-critical differences between ARMv7 and conventional des...

Publication date: 1990